|
In Unicode and the UCS, a compatibility character is a character that is encoded solely to maintain round trip convertibility with other, often older, standards.〔(【引用サイトリンク】url=http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf#G11062 )〕 As the Unicode Glossary says:
Although ''compatibility'' is used in names, it is not marked as a property. However, the definition is more complicated than the glossary reveals. One of the properties given to characters by the Unicode consortium is the characters' decomposition or compatibility decomposition. Over five thousand characters do have a compatibility decomposition mapping that compatibility character to one or more other UCS characters. By setting a character's decomposition property, Unicode establishes that character as a compatibility character. The reasons for these compatibility designations are varied and are discussed in further detail below. The term ''decomposition'' sometimes confuses because a character's decomposition can, in some cases, be a singleton. In these cases the decomposition of one character is simply another approximately (but not canonically) equivalent character. == Compatibility character types and keywords == The compatibility decomposition property for the 5,402 Unicode compatibility characters includes a keyword that divides the compatibility characters into 17 logical groups. Those characters with a compatibility decomposition but without a keyword are termed canonical decomposable characters and those characters are not compatibility characters. Keywords for compatibility decomposable characters include: <initial>, <medial>, <final>, <isolated>, <wide>, <narrow>, <small>, <square>, <vertical>, <circle>, <noBreak>, <fraction>, <sub>, <super>, and <compat>. These keywords provide some indication of the relation between the compatibility character and its compatibility decomposition character sequence. Compatibility characters fall in three basic categories: # Characters corresponding to multiple alternate glyph forms and precomposed diacritics to support software and font implementations that do not include complete Unicode text layout capabilities. # Characters included from other character sets or otherwise added to the UCS that constitute rich text rather than the plain text goals of Unicode. # Some other characters that are semantically distinct, but visually similar. Because these semantically distinct characters may be displayed with glyphs similar to the glyphs of other characters, text processing software should try to address possible confusion for the sake of end users. When comparing and collating (sorting) text strings, different forms and rich text variants of characters should not alter the text processing results. For example, software users may be confused when performing a find on a page for a capital Latin letter ‘I’ and their software application fails to find the visually similar Roman numeral ‘Ⅰ’. 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「Unicode compatibility characters」の詳細全文を読む スポンサード リンク
|